AITopics | sparse function

eb1bad7a84ef68a64f1afd6577725d45-Paper-Conference.pdf

Neural Information Processing SystemsApr-30-2026, 04:41:45 GMT

artificial intelligence, deep learning, machine learning, (19 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Data Science (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)

Add feedback

On the Complexity of Learning Sparse Functions with Statistical and Gradient Queries

Neural Information Processing SystemsMar-22-2026, 06:18:00 GMT

The goal of this paper is to investigate the complexity of gradient algorithms when learning sparse functions (juntas). We introduce a type of Statistical Queries ($\mathsf{SQ}$), which we call Differentiable Learning Queries ($\mathsf{DLQ}$), to model gradient queries on a specified loss with respect to an arbitrary model. We provide a tight characterization of the query complexity of $\mathsf{DLQ}$ for learning the support of a sparse function over generic product distributions. This complexity crucially depends on the loss function. For the squared loss, $\mathsf{DLQ}$ matches the complexity of Correlation Statistical Queries $(\mathsf{CSQ})$--potentially much worse than $\mathsf{SQ}$. But for other simple loss functions, including the $\ell_1$ loss, $\mathsf{DLQ}$ always achieves the same complexity as $\mathsf{SQ}$. We also provide evidence that $\mathsf{DLQ}$ can indeed capture learning with (stochastic) gradient descent by showing it correctly describes the complexity of learning with a two-layer neural network in the mean field regime and linear scaling.

artificial intelligence, machine learning, mathsf, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.60)

Add feedback

On the Complexity of Learning Sparse Functions with Statistical and Gradient Queries

Neural Information Processing SystemsFeb-17-2026, 19:02:20 GMT

"juntas" [Blum and Langley, 1997]), that is functions that depend only on a small number

artificial intelligence, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.04)

Genre: Research Report > Experimental Study (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.67)

Add feedback

eb1bad7a84ef68a64f1afd6577725d45-Paper-Conference.pdf

Neural Information Processing SystemsFeb-17-2026, 19:00:28 GMT

artificial intelligence, deep learning, machine learning, (19 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.04)
Asia > China (0.04)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Data Science (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)

Add feedback

The Multiple Quantile Graphical Model

Neural Information Processing SystemsNov-21-2025, 14:42:34 GMT

We introduce the Multiple Quantile Graphical Model (MQGM), which extends the neighborhood selection approach of Meinshausen and Buhlmann for learning sparse graphical models. The latter is defined by the basic subproblem of modeling the conditional mean of one variable as a sparse function of all others. Our approach models a set of conditional quantiles of one variable as a sparse function of all others, and hence offers a much richer, more expressive class of conditional distribution estimates. We establish that, under suitable regularity conditions, the MQGM identifies the exact conditional independencies with probability tending to one as the problem size grows, even outside of the usual homoskedastic Gaussian data model. We develop an efficient algorithm for fitting the MQGM using the alternating direction method of multipliers. We also describe a strategy for sampling from the joint distribution that underlies the MQGM estimate. Lastly, we present detailed experiments that demonstrate the flexibility and effectiveness of the MQGM in modeling hetereoskedastic non-Gaussian data.

multiple quantile graphical model, name change, proceedings, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.68)

Add feedback

On the Complexity of Learning Sparse Functions with Statistical and Gradient Queries

Neural Information Processing SystemsOct-10-2025, 14:51:54 GMT

"juntas" [Blum and Langley, 1997]), that is functions that depend only on a small number

algorithm, complexity, query, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.04)

Genre: Research Report > Experimental Study (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.67)

Add feedback

Position: A Theory of Deep Learning Must Include Compositional Sparsity

Danhofer, David A., D'Ascenzo, Davide, Dubach, Rafael, Poggio, Tomaso

arXiv.org Artificial IntelligenceJul-4-2025

Overparametrized Deep Neural Networks (DNNs) have demonstrated remarkable success in a wide variety of domains too high-dimensional for classical shallow networks subject to the curse of dimensionality. However, open questions about fundamental principles, that govern the learning dynamics of DNNs, remain. In this position paper we argue that it is the ability of DNNs to exploit the compositionally sparse structure of the target function driving their success. As such, DNNs can leverage the property that most practically relevant functions can be composed from a small set of constituent functions, each of which relies only on a low-dimensional subset of all inputs. We show that this property is shared by all efficiently Turing-computable functions and is therefore highly likely present in all current learning problems. While some promising theoretical insights on questions concerned with approximation and generalization exist in the setting of compositionally sparse functions, several important questions on the learnability and optimization of DNNs remain. Completing the picture of the role of compositional sparsity in deep learning is essential to a comprehensive theory of artificial, and even general, intelligence.

artificial intelligence, deep learning, machine learning, (13 more...)

arXiv.org Artificial Intelligence

2507.0255

Country:

North America > United States (0.68)
Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report (0.64)

Industry:

Education (0.49)
Leisure & Entertainment (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

On the Complexity of Learning Sparse Functions with Statistical and Gradient Queries

Joshi, Nirmit, Misiakiewicz, Theodor, Srebro, Nathan

arXiv.org Artificial IntelligenceJul-8-2024

The goal of this paper is to investigate the complexity of gradient algorithms when learning sparse functions (juntas). We introduce a type of Statistical Queries ($\mathsf{SQ}$), which we call Differentiable Learning Queries ($\mathsf{DLQ}$), to model gradient queries on a specified loss with respect to an arbitrary model. We provide a tight characterization of the query complexity of $\mathsf{DLQ}$ for learning the support of a sparse function over generic product distributions. This complexity crucially depends on the loss function. For the squared loss, $\mathsf{DLQ}$ matches the complexity of Correlation Statistical Queries $(\mathsf{CSQ})$--potentially much worse than $\mathsf{SQ}$. But for other simple loss functions, including the $\ell_1$ loss, $\mathsf{DLQ}$ always achieves the same complexity as $\mathsf{SQ}$. We also provide evidence that $\mathsf{DLQ}$ can indeed capture learning with (stochastic) gradient descent by showing it correctly describes the complexity of learning with a two-layer neural network in the mean field regime and linear scaling.

algorithm, complexity, query, (17 more...)

arXiv.org Artificial Intelligence

2407.05622

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

Add feedback

Theoretical Analysis of Inductive Biases in Deep Convolutional Networks

Wang, Zihao, Wu, Lei

arXiv.org Artificial IntelligenceMay-15-2023

In this paper, we study the inductive biases in convolutional neural networks (CNNs), which are believed to be vital drivers behind CNNs' exceptional performance on vision-like tasks. We first analyze the universality of CNNs, i.e., the ability to approximate continuous functions. We prove that a depth of $\mathcal{O}(\log d)$ is sufficient for achieving universality, where $d$ is the input dimension. This is a significant improvement over existing results that required a depth of $\Omega(d)$. We also prove that learning sparse functions with CNNs needs only $\tilde{\mathcal{O}}(\log^2d)$ samples, indicating that deep CNNs can efficiently capture long-range sparse correlations. Note that all these are achieved through a novel combination of increased network depth and the utilization of multichanneling and downsampling. Lastly, we study the inductive biases of weight sharing and locality through the lens of symmetry. To separate two biases, we introduce locally-connected networks (LCNs), which can be viewed as CNNs without weight sharing. Specifically, we compare the performance of CNNs, LCNs, and fully-connected networks (FCNs) on a simple regression task. We prove that LCNs require ${\Omega}(d)$ samples while CNNs need only $\tilde{\mathcal{O}}(\log^2d)$ samples, which highlights the cruciality of weight sharing. We also prove that FCNs require $\Omega(d^2)$ samples while LCNs need only $\tilde{\mathcal{O}}(d)$ samples, demonstrating the importance of locality. These provable separations quantify the difference between the two biases, and our major observation behind is that weight sharing and locality break different symmetries in the learning process.

artificial intelligence, deep learning, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2305.08404

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.04)
Asia > China (0.04)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

The merged-staircase property: a necessary and nearly sufficient condition for SGD learning of sparse functions on two-layer neural networks

Abbe, Emmanuel, Boix-Adsera, Enric, Misiakiewicz, Theodor

arXiv.org Machine LearningFeb-17-2022

It is currently known how to characterize functions that neural networks can learn with SGD for two extremal parameterizations: neural networks in the linear regime, and neural networks with no structural constraints. However, for the main parametrization of interest (non-linear but regular networks) no tight characterization has yet been achieved, despite significant developments. We take a step in this direction by considering depth-2 neural networks trained by SGD in the mean-field regime. We consider functions on binary inputs that depend on a latent low-dimensional subspace (i.e., small number of coordinates). This regime is of interest since it is poorly understood how neural networks routinely tackle high-dimensional datasets and adapt to latent low-dimensional structure without suffering from the curse of dimensionality. Accordingly, we study SGD-learnability with $O(d)$ sample complexity in a large ambient dimension $d$. Our main results characterize a hierarchical property, the "merged-staircase property", that is both necessary and nearly sufficient for learning in this setting. We further show that non-linear training is necessary: for this class of functions, linear methods on any feature map (e.g., the NTK) are not capable of learning efficiently. The key tools are a new "dimension-free" dynamics approximation result that applies to functions defined on a latent space of low-dimension, a proof of global convergence based on polynomial identity testing, and an improvement of lower bounds against linear methods for non-almost orthogonal functions.

artificial intelligence, machine learning, merged-staircase property, (4 more...)

arXiv.org Machine Learning

2202.08658

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Filters

Collaborating Authors

sparse function

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

eb1bad7a84ef68a64f1afd6577725d45-Paper-Conference.pdf

On the Complexity of Learning Sparse Functions with Statistical and Gradient Queries

On the Complexity of Learning Sparse Functions with Statistical and Gradient Queries

eb1bad7a84ef68a64f1afd6577725d45-Paper-Conference.pdf

The Multiple Quantile Graphical Model

On the Complexity of Learning Sparse Functions with Statistical and Gradient Queries

Position: A Theory of Deep Learning Must Include Compositional Sparsity

On the Complexity of Learning Sparse Functions with Statistical and Gradient Queries

Theoretical Analysis of Inductive Biases in Deep Convolutional Networks

The merged-staircase property: a necessary and nearly sufficient condition for SGD learning of sparse functions on two-layer neural networks